1 Introduction 



The Lattice Boltzmann Method (LBM) was introduced in the late 80 's to cope 
with the two major drawbacks of Lattice Gas Cellular Automata (LGCA), i.e. 
statistical noise and exponential complexity of the updating rule governing the 
time evolution of the cellular automaton (Mc Namara 1988, Higuera & Jimenez 
1989, Benzi et al. 1992). Ever since, the LBM has undergone progressive re- 
finements which have brought it to the point where it can compete with most 
advanced computational fluid dynamics (CFD) methods for a wide variety of 
problems, ranging from fully developed, homogeneous incompressible turbu- 
lence, to low Reynolds number flows in porous media. 

However, when it comes to complex geometries such as those commonly encoun- 
tered in many engineering applications, for instance internal flows of automotive 
interest, LBM still lags significantly behind state-of-the art CFD techniques. 
This is due to the inability of LBM to accommodate any sort of non-uniformity 
in the spatial distribution of the mesh grid points. This limitation is a direct 
inheritance from LGCA, which are based upon a set of mono-energetic par- 
ticles (same speed amplitude for the various propagation directions) hopping 
synchronously, in lock-step mode, from site to site according to the direction of 
the discrete speeds. Since the discrete speeds must be the same at any lattice 
site, a uniform spatial lattice necessarily results. 

This is in a blatant contrast with the modern CFD methods which are gener- 
ally capable of accomodating fairly complex meshes. In the attempt to bridge 
this gap, a coarse-grained extension of the LBM has been recently introduced 
(Nannelli 1992). 

This extension, by borrowing standard ideas from the Finite Volume method, 
does provide a significant enhancement of the geometrical flexibility of LBM, 
although, for the sake of simplicity, it was restricted to two dimensional, carte- 
sian non-uniform grids. 

In this paper, the two dimensional restriction is lifted and a fully three dimen- 
sional coarse-grained LBM is developed. Before moving on to the details on 
how this is achieved, let us spend some remarks on a further reason why, we 
believe, the present study is warranted. 

It is argued (Boris 1989, Boris et al. 1992) that the increasing availability of 
parallel computers is pushing CFD towards a situation of diminishing returns 
in terms of trading off computational cost for accuracy. In other words, the 
question is whether it is more effective to increase the grid resolution using a 
low-order "lean" scheme, rather then striving to save memory using a high-order 
"heavy" scheme. 

The LB method is well positioned to attempt a contribution in this direction: it 
is a low-order explicit scheme (2nd order in space, first in time) which performs 
extremely well on virtually any parallel architecture. On the other hand, as it 
stands today, the LB method cannot compete with modern CFD methods in 
situations where non-uniform stretched mesh are required. In fact, the gap in 
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number of grid points is simply too huge and no argument of better parallel 
efficiency can really compensate for it. This prompts the need of developing 
extended LB schemes able to reduce the gap, if not close it altogether. 
The extension we shall be looking for, should be such as to achieve geometrical 
flexibility without compromising the outstanding amenability to parallel com- 
puting to any serious extent. This paper presents the first exploratory effort in 
this direction for flows of engineering interest. 



2 Short review of Lattice Boltzmann method 

The Lattice Boltzmann Equation reads as follows: 

b 

Mx + c i; t + 1) - /,(*, t) = J2 M/i - if ) (!) 
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where fi represent the probability of a particle to be moving along direction 
Ci, fly is the scattering matrix between state i and j and f^ q are the local 
equilibrium populations, expanded to second order in the flow speed u to retain 
convective effects. Here f eq is given by (repeated indices are summed upon): 

ceq _P A , 1 



fi q = - yl + -Ci, a u a + 2Qi, a pu a upj a,fi = x,y,z (2) 

where Qi, a p = Ci ya Ci y p — hS a /3, is the projector upon the i-th speeds, p = fi is 
the flow density, and u = 'YlnfiCi/ p is the hydrodynamic velocity. The discrete 
speeds Cj belong to a four-dimensional face centered hypercube (FCHC) defined 
by |cj| = y/2, and Ci tX + c itV + c^ z + Ci^ w = 2 (Frisch et al., 1987). 
The equation ([j]) can be regarded as an explicit finite-difference approximation 
to a model Boltzmann equation of a BGK type (Qian et al. 1992). Also one 
can prove the existence of a H-theorem which guarantees its numerical stability 
in the linear regime (Mach <C 1) provided the spectrum of f2y is confined to 
the strip — 2 < A < . As a result, the eigenvalue A can be tuned to minimize 
the viscosity according to the relation 

1.1 1. 

^ = -3 ( 2 + A } 

For further details see the recent review by Qian et al., (1995). 
The basic merits of LBE are: 

• Flexibility in the choice of the collision rules; 

• Flexibility in the handling of boundary conditions; 

• Ideal amenability to vector/parallel computing; 
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A serious drawback of LBE, as compared to advanced CFD solvers, relates 
to the constraint of operating on a uniform, regular mesh. This limitation is 
particularly offending for those engineering applications in which a selective dis- 
tribution of the spatial grid points is required in order to cluster the degrees of 
freedom there where needed on account of physical and geometrical demands. 
The main idea proposed in this paper is to overcome the limitation of the uni- 
form LB scheme based on the two-grid procedure (Nannelli 1992, Succi 1994) 
described in the next section. 



3 LB for a non-uniform grid 

Think of two different lattices, Lf and L c : Lf is a fine-grained uniform lattice 
corresponding to the usual LB scheme, L c , instead, is a non-uniform coarser 
lattice whose cells typically contain several nodes of Lf. The idea is to take the 
differential form of LB Dynamics: 

b 

dtfi + <k ■Vf i = J2 ton tfi ~f! q ) = ( 3 ) 
j'=i 

and apply a finite-volume procedure based upon integration of eq. (|3|) on each 
cell of the coarse grid L c . By straightforward use of Gauss theorem, we obtain 

(4) 

where 



dt 
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Here Fj is the mean population of the macrocell C, <&i the corresponding flux 
across the boundaries of C, and Qj is the rate of change of due to collisions 
occurring within the cell C. 

The expression (|j) represents a set of N c ordinary differential equations for the 
unknowns N c being the number of cells of the coarse grid L c . To close 
this system, we need to express the surface fluxes <&j in terms of the cell-values 
Fi. This calls for an appropriate interpolation procedure mapping the fine- 
grain distribution fi onto the coarse-grain distribution F%. Note that while fi 
is defined on the nodes of Lf, Fi is located on the centers of the cells of L c . In 
a formal sense, we can write: 

R : Ft -> fi (6) 
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where the reconstruction operator R is the (pseudo-)inverse of the averaging 
operator A: 

A:h^F t (7) 

operationally defined by eq. (||). 

Two reconstruction operators have been considered: a piece- wise constant (PWC) 
and a piece- wise linear (PWL). For the sake of simplicity hereafter we shall re- 
strict our attention to the case of a cartesian non-uniform grid stretched along 
the z-direction: 

Aa; — b x ■ a, Ay = b y ■ a, Az — h(z) ■ a (8) 

where a is the spacing of the uniform fine-grained grid. 

This extends previous non-uniform LB schemes in three respects: first, the 

scheme is three dimensional; second, the stretching factors b x , b y , h{z) need not 

be integers; third, the stretching factor along z need not be a constant. 

By using piece-wise constant interpolation for the collision operator (local in 

space) and piecewise linear for the streaming operator (first-order), we arrive 

at the following coarse-grained LB equation (Finite Volume LBE, FVLBE for 

short): 

N t b 

# = £ c i F " + E % - F f ) ( 9 ) 

i/=0 j 

where Fi is the mean popolation at time t + At, the index v denotes the macro- 
cell involved in the piecewice linear interpolation (i.e. Ff = Fi), and C" are the 
coarse-grained streaming coefficents. The scattering matrix is the same as 
given in eq. |l|, and F^ q is the same as eq. ||. The streaming coefficents C" are 
given in Appendix for propagation direction Ci = (0, 0, 1) and Cj = (1, 0, 1). 
Consistently with the intents stated in the introduction, the eq.|| achieves geo- 
metrical flexibility at a minimal price in terms of aptness to parallel computing. 
Geometrical flexibility is in charge of coarse-grained streaming coefficents C" , 
while almost ideal amenability to parallel computing is preserved because within 
our piecewise interpolation, the coarse-grained collision is still completely local 
in space. 

4 Turbulent Channel Flow simulations 

The FVLBE scheme described in the previous section has been tested for the 
case of three dimensional turbulent channel flow. This application is especially 
suited to our purpose, first because it represents the simple instance of a ge- 
ometry calling for a non-uniform mesh, and second in view of the wide body of 
avalaible literature. Two series of simulations have been performed: low resolu- 
tion (32 x 32 x 100) and moderate resolution (64 x 64 x 128). Let us begin by 
describing the former. Three series of low resolution runs have been performed 
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with varying factors b x , b y (see table 1). The mesh-distribution along z is given 
by the following 1 — 2—1 law, 

[ 1 for 1 < k < 25 
Az(k) = < 2 for 26 < k < 75 
[ 1 for 76 < k < 100 

corresponding to a channel height H = 150. The initial condition corresponds 
to a Poiseuille flow along x with average speed Uq = 0.35, perturbed with a 
multiperiodic divergence free velocity field. The molecular viscosity is v$ — 

0. 005, corresponding to a nominal Reynolds number Reo — UqH/2 x vq ~ 
6000. The first outcome of the numerical experiments is that fully developed 
turbulence is supported only within a restricted time window, lasting up to 5000, 
15000, 70000 time step for L3, L2, LI respectively. As a result only LI lends itself 
to a (partial) statistical analysis of the turbulent field. The analysis proceeds as 
follows. Based on consolidated wisdom, the velocity field in a turbulent channel 
flow is expressed as follows: 

where \ — 0.4 is the Von Karman constant, a typical turbulent velocity, 
and d is a calibration constant (d — 5.5 ± 0.5) (Landau). Here S = v /v* is the 
width of the "viscous sublayer" while for 5 < z we have the "inertial sublayer" . 
The average velocity profiles drawn from the numerical simulation are checked 
against eq. ( ^0|) to produce best fit values of v n ,v™, d n where the superscript n 
denotes "numerical simulation" . The consistency check consists of comparing 
v n with the input laminar viscosity i/q, t>* with the value provided by the wall 
stress tensor: =< u x (0)u z (0) >, and finally d n with the existing literature, 

1. e. d = 5.5 ± 0.5. The actual values of v n , v™, d n are derived from the slope of 
the linear u x versus z plot (v%/v), the slope of the u x versus log(z) plot ( 

) and the value of log(u x ) at z = 1 («*/x ' log(v*/v) + dv*). In order to assess 
the grid independence of the numerical results, three mesh-distributions along 
z have been examined: 
Lattice "A" 

1 for 1 < k < 24 
Az(k) = { 2 for 25 < k < 75 
1 for 76 < k < 100 



Lattice "B" 



Az(k) = < 



1 

from 
2 

from 
1 



1.05 to 1.95 



1.9 to 1.1 



for 1 < k < 14 
for 15 < k < 34 
for 35 < k < 65 
for 66 < k < 85 
for 86 < k < 100 
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Lattice "C 



(?) 



' 1 
2 



for 1 < fc < 37 
for fc = 38 



As(ifc) = < 3 



for 39 < fc < 62 
for fc = 63 



2 
1 



for 64 < fc < 100 



The time averaged velocity profiles u x {z) are shown in fig.??, fig.??, fig.??, 
while corresponding best-fit values are reported in table 2. The dashed line in 
fig. ??, corresponds to the analytical profiles, eq. (fio|), with the min & max 
values drawn from the numerical experiment. These min & max are obtained by 
interpolating the numerical data with a family of straight lines, and then taking 
the min & max slopes within this family. The reason for dealing with a family 
of straight lines instead of just one, is that there doesn't appear to be a unique 
choice for the set of numerical data to be included in the best fit procedure. 
Consequently, by reporting both min and max we intend to provide a measure 
of the statistical scatter. From these figures we see that the grids "A" and "B" 
are the best performers, while grid "C" is pretty out of target. No consistency 
check for v* is available for these simulations because the turbulence window is 
too narrow to allow the collection of a significant statistical sample for the wall 
stress tensor. In summary, grid A provides similar results as grid B, although 
slightly better in terms of effective viscosity. In either cases, the measured 
viscosity is more than twice the laminar one vq. This is the result of the non- 
uniform mesh which introduces sharp localized peaks of artificial viscosity in the 
vicinity of mesh size discontinuities (z= 32 in our case). This effect has been 
found to fade away as the grid resolution is increased (Amati 1994, Succi 1995). 
In consequence, moderate resolution runs have been performed using grid A. 

4.1 Moderate Resolution simulations 

These simulations have been performed on a 64 x 64 x 128 grid with scaling 
factors b x = 15, b y = 8. Mesh points along z have been distributed according to 
the following 1 — 2 — 1 law: 



The result is a physical channel of height H = 192, length L x — 960, and width 
L y = 512, i.e. pretty close to the one examined by Moin and coworkers (Moin 
k Kim 1980, Moin & Kim 1982, Rogers & Moin 1987, Kim et al. 1987, Jimenez 
& Moin 1991). The main outcome of these simulations is that turbulence is 
supported for the entire life span of the simulation, that is 2.4 x 10 5 time steps, 



Az(fc) 



1 for 1 < fc < 32 

2 for 33 < fc < 96 
1 for 97 < fc < 128 
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corresponding to about 90 transit time units L x /Uq. This is due to the fact that 
the channel length is now able to support streamwise rolls feeding cross-channel 
turbulence. These samples have been collected every 53 steps in the interval 
[10 5 ,2.4 x 10 5 ], thus yielding about 2600 profiles for statistical analysis. The 
results are shown in fig. ?? and fig. ??. The numerical best-fit values deduced 
from the analysis are as follows: 

v n = 0.013 ± 0.002, <= 0.013 ±0.001, d n = 6.5 ± 0.7 (11) 

As a first remark, we see that w" is quite close to the values provided by low 
resolution simulations. Also, we note that d n is within the error bars provided 
by the literature although somewhat (10— 20%) too large. Finally, since turbu- 
lence is sustained for a significant time-span, wall stress-tensor statistics is also 
available. This yields: 

v* = V< u x u z >\ z=0 ~ 0.012 (12) 

in a pretty good match with the values deduced from the velocity profiles. 
For the sake of a better comparison with the existing literature, rescaled data 
u + = u/v* and r + =< uv > /(v*) 2 are reported as a function of dimcnsionless 
units (z+ = z/S and (2z/H — 1)). These results are presented in fig. ?? and ?? 
in which the same quantities pertaining to other numerical and experimental 
results are also reported. From figure ?? we see that our mean flow compares 
rather well with existing data, although the overestimation of d is clearly visible. 
This points to a lack of resolution which prevents our simulation to attain suffi- 
ciently high Reynolds numbers. Note in fact that the thickness of the boundary 
layer 5 = v /v* is in our simulation just one lattice spacing wide. 
Such a consideration is indeed corroborated by the results shown in fig. ??. 
From this figure we see that while the stress tensor is correctly captured in the 
central region of the channel (whence the possibility to obtain a correct estimate 
of v*), the wall turbulence is definitely too low as compared with literature data. 
These moderate resolution runs suggest that the FVLBE scheme provides results 
within the errors bars of current CFD methods even though a better control of 
numerical diffusion is needed to make it more competitive. At this stage, it 
is therefore of interest to spend some comments on the issue of numerical effi- 
ciency. As pointed out in the introduction, FVLBE has been generated in order 
to extend the range of applicability of the Lattice Boltzmann method to non- 
uniform geometries while keeping optimal amenability to parallel computing. 
This is achieved at the expense of an increased compute density, i.e. floating 
point operation per grid point, because, at variance with LBE, the propagation 
step involves more than just a two-point stencil. The idea is to offset this ex- 
cess of computation per node by a substantial reduction of the number of grid 
points to be used in the simulation, which is made possible by the capability 
to compress/rarefy the spatial grid distribution. For the case in point, substan- 
tial grid savings should be planned along the streamwise (x) and spanwise (y) 
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direction where relatively long-wavelength structures are expected to arise as 
compared to cross-flow (z) eddies. For the present 64 x 64 x 128 simulation, 
each time step takes about 5 second CPU time on a IBM RISC/6000 mod. 580. 
This corresponds to about 10/is per grid point per step, to be compared with 
roughly 3[is taken by a uniform LBE. These figures reflect approximately the 
increase in the number of floating point operation per grid point: about 1000 
for FVLBE and 500 for LBE. This factor two is largely overcompensated by 
the much larger size accessible to FVLBE, i.e. L x — 960, L y = 512, H = 192, 
corresponding to a gain factor of 

512 960 192 

"64" • ^4" • 128 - 180 < 13 > 

i.e. about two orders of magnitude. Indeed the channel flow simulation pre- 
sented in this paper would be simply unfeasible with a plain LBE scheme, for 
the latter would take about 30000 CPU seconds per time step, and about 150 
Gbyte of storage! 

To date, the largest channel flow simulation we have been also able to per- 
form with a uniform LB scheme is a 432 x 144 x 288 (1.7 GB) corresponding 
to Re ~ 3000, using the 512 processor Quadrics Machine (Bartoloni ct al., 
1993) Although the parallel performance is exceedingly encouraging (parallel 
efficiency 54 vs. 64, as can be seen in fig.??, Amati et al., 1996). It is clear that 
parallel computing alone cannot make up for the overdemand of computational 
resources raised by uniform LB scheme. 

Our code is almost a factor ten faster than modern semi-implicit CFD methods 
(compare our 5 s/step on a 64 x 64 x 128 with a 25 s/step on a 32 x 64 x 97, see 
Orlandi, 1995) but the quality of the results is correspondingly less satisfatory. 
Both gaps are likely to close up once better interpolators are in place. At this 
stage, only aptness to parallel computing will make the difference. 



5 Conclusions 

By borrowing standard techniques from the finite volume method, a low order 
coarse-grain three dimensional LB scheme has been developed. 
This scheme basically preserves the outstanding amenability to parallel comput- 
ing of the uniform LB scheme, while giving access to a much larger Reynolds 
number class of flows. Actual numerical simulation do, however, reveal that 
the large-scales (the resolved ones) display less turbulent activity than expected 
on the basis of the nominal Reynolds number. This means that, while mark- 
ing a singificant stride forward with respect to the uniform scheme, the present 
coarse-grained LB still lags behind state-of-art CFD methods. 
A plausible explanation is that our low order interpolator (piecewise constant 
for collision operator, and piecewise linear for the streaming operator) does 
achieve locality (hence aptness to parallel computing) much at the expense of 
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accuracy. Future work shall then focus on the development of better interpo- 
lation schemes, possibly in the spirit of Monotone Interpolated Large Eddies 
Simulation (MILES). 

In principle there is no reason why the basic advantages of LBM, i.e. handy 
treatment of complex boundary conditions and outstanding amenability to par- 
allel computing, should not carry over into FVLBE. Should this be the case, 
FVLBE may represent a fairly competitive tool for the numerical investigation 
of inhomogenous turbulent flows on highly parallel machines. 
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A Appendix 



In this Appendix we report the explicit expressions of some representative 
streaming coefficients. All quantities are measured in units of the fine-grain 
uniform lattice, i.e. a = 1. Along the direction Cj = (0,0, 1) the evolution of F 
is given by: 
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F = aF + (3F 1 + 7 F 

where F 1 and F 11 are the population at the nearest and next-to-nearest left 
neighbors and a, (3 and 7 are the streaming coefficients. Their expression is: 



a(k) = 



1 - 



1 2(z+(k) - z{k)) 



h{k) 



7 (fc) 



h{k) h{k) h{k) + h{k-l) 

2(z+{k) - z(k)) 2(z+{k- 1) -z{k- 1)) 
h(k) + h(k-l) + h(k - 1) + h(k - 2) 

2(z+(k- 1) -z(k- 1)) 1 



(14) 

(15) 
(16) 



h(k - 1) + h(k - 2) h(k) 

here the coefficients depends on the index fc, z(k) is the z-coordinate of the 

center of the macrocell, and z + (k) = z(k) + h ^ k ^ 1 , 

For diagonal propagation, like c— (1, 0, 1), the coefficients are: 



F - ai F + fcFl + lt F» + 5 t Fl H + tiFf v + ^FY 



(17) 
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(b x + h(k)-l) 



b x h(k) 



+ 



b x h(k) 



2{z~{k)- z{k)) 
h(k) + h(k - 



h{k) - \ 
b x h(k) 



{x+(k)-x(k)) 



2 _|_ ^ x 2 

b x h(k) b x h(k) 



2(z-(fc)-z(fc)) 
h(k) + h(k - 1) 



+ 



l r 



b x h{k) 



2(z-(k + l)- z(k + l)) 



h(k + l) + h{k) 



i r 



7i 



b x~2 

b x h(k) 



2(z-(k + l)-z(k + l)) 



h(k + l) + h{k) 



h{k)-\ | h(k)-\ [>+(,) - ,•(/)) 
b x h(k) b x h(k) 



+ 



+ 



h{k) - \ 
b x h{k) 

h(k) - \ 
b x h(k) 



{x+(i + l)-x{i + l)) 



{x+(i + l)-x(i + l)) 



b x h(k) 
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